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Growth is a fundamental process of life. Growth requirements are well-characterized experimen- 
tally for many microbes; however, we lack a unified model for cellular growth. Such a model must be 
predictive of events at the molecular scale and capable of explaining the high-level behavior of the 
cell as a whole. Here, we construct an ME-Model for Escherichia coli — a genome-scale model that 
seamlessly integrates metabolic and gene product expression pathways. The model computes 
~80% of the functional proteome (by mass), which is used by the cell to support growth under a 
given condition. Metabolism and gene expression are interdependent processes that affect and 
constrain each other. We formalize these constraints and apply the principle of growth optimization 
to enable the accurate prediction of multi-scale phenotypes, ranging from coarse-grained (growth 
rate, nutrient uptake, by-product secretion) to fine-grained (metabolic fluxes, gene expression 
levels) . Our results unify many existing principles developed to describe bacterial growth. 
Molecular Systems Biology 9: 693; published online 1 October 2013; doi:10.1038/msb.2013.52 
Subject Categories: metabolic and regulatory networks; computational methods 
Keywords: gene expression; genome-scale; metabolism; molecular efficiency; optimality 



Introduction 

The genotype-phenotype relationship is fundamental to 
biology. Historically, and still for most phenotypic traits, this 
relationship is described through qualitative arguments based 
on observations or through statistical correlations. Under- 
standing the genotype-phenotype relationship demands van- 
tage points at multiple scales, ranging from the molecular to 
the cellular. Reductionist approaches to biology have produced 
'parts lists', and successfully identified key concepts (e.g., 
central dogma) and specific chemical interactions and 
transformations (e.g., metabolic reactions) fundamental to 
life. However, reductionist viewpoints, by definition, do not 
provide a coherent understanding of whole cell functions. 
For this reason, modeling whole biological systems (or 
subsystems) has received increased attention. 

A number of modeling approaches have been developed to 
predict systems-level phenotypes. What distinguish these 
models from each other are the underlying assumptions they 
make, the input data they require, and the scope and precision 
of their predictions (Selinger et al, 2003) . The type of modeling 
formalism employed is influenced by all of these distinguish- 
ing characteristics (Machado et al, 2011). Genome-scale 
optimality models of metabolism (termed as M-Models) have 
made much progress in recent years as they require only basic 
knowledge of reaction stoichiometry, are genome-scale in 
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scope, and have fairly accurate predictive power. Recently, 
M-Models have been extended to include the process of gene 
expression (termed as ME-Models) (Lerman et al, 2012; Thiele 
et al y 2012), opening up completely new vistas in the 
development of microbial systems biology. On the heels of 
these developments, a whole-cell model (WCM) of the human 
pathogen Mycoplasma genitalium appeared (Karr et al, 2012) . 
The WCM integrates many more cellular processes and can be 
used to simulate dynamic cellular states; however, it depends 
on detailed molecular measurements of an initial state (e.g., 
growth rate, biomass composition, and gene expression). 
While the model described by Karr et al is a major advance 
toward whole-cell computation, many practical applications 
rely on the ability to compute optimal phenotypic states. The 
WCM does not have this ability owing to the disparate 
mathematical formalisms it employs. The WCM and gen- 
ome-scale optimality models thus have different capabilities 
and will find use to predict and explain different biological 
phenomena. 

Here, we construct an ME-Model for E. coli K-12 MG1655. 
The ME-Model is a microbial growth model that computes the 
optimal cellular state for growth in a given steady-state 
environment. It takes as input the availability of nutrients to 
the cell and produces experimentally testable predictions for: 
(1) the cell's maximum growth rate (u*) in the specified 
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environment, (2) substrate uptake/by-product secretion rates 
at u*, (3) metabolic fluxes at u*, and (4) gene product 
expression levels at u* . The creation of this model required the 
development of a new modeling formalism and optimization 
procedure to couple gene expression with metabolism, which 
provided new insight into growth rate- and nutrient limitation- 
dependent changes in enzymatic efficiency. The model 
predicts three distinct regions of microbial growth, defined 
by the factors (nutrient and/or proteome) limiting growth. We 
show that proteomic constraints improve predictions of 
metabolism itself, rectifying dominant failure modes in 
M-Models. Finally, we compute gene expression changes as 
the cell transitions through and between the different growth 
regions. The ME-Model computes measurable coarse- and 
fine-grained cellular and molecular phenotypes, and provides 
unity in the field by reconciling a variety of principles related to 
cellular growth at various scales of complexity. 



Results 

Integration of genome-scale reaction networks of 
protein synthesis and metabolism 

To create an ME-Model for E. coli, we started from two 
previous network reconstructions. The first reaction network 
includes all known metabolic pathways as of late 2011 (Orth 
et al, 2011) and is referred to as the M-Model throughout. The 
second accounts for reactions that describe gene expression 
and the synthesis of functional macromolecules in a mechan- 
istically detailed manner (Thiele et al, 2009) . The two reaction 
networks were integrated (see Materials and methods), 
and reactions and gene functions in both networks were 
updated to reflect gaps in knowledge that have been filled 
since their creation. We updated subunit stoichiometries for 
hundreds of multiprotein complexes and expanded the types 
of prosthetic groups, cofactors, and post-translational 
modifications required for catalytic activity (Materials and 
methods; Supplementary Table SI). 

The scope and coverage of cellular processes in the 
integrated network is extensive. The integrated network 
mechanistically links the functions of 1541 unique protein- 
coding open reading frames (ORFs) and 109 RNA genes; it thus 
accounts for -35% of the 4420 protein-coding ORFs, -65% 
of the functionally well-annotated ORFs (Riley et al, 2006) , and 
53.7% of the non-coding RNA genes identified in E. coli K-12 
(Keseler et al, 2013). In total, 1295 unique functional protein 
complexes are produced. Taken together, these complexes 
account for 80-90% of E. coWs expressed proteome by mass 
(Supplementary Table S2) . 



The integrated reaction network covers and accurately 
predicts a large proportion of essential cellular functions. 
It includes 223 of the 302 (73.8%) genes classified as 
essential for cell growth under any condition (Kato and 
Hashimoto, 2007) (Supplementary Table S3 A), and 166 of the 
206 functions (80.6%) estimated as essential for a minimal 
organism (Gil et al, 2004) (Supplementary Table S3B). 
In silico prediction of gene essentiality in glucose 
M9 minimal media results in an accuracy of 88.8% 
(precision = 60.4%, recall = 75%, Supplementary Table S4). 
One of the dominant failure modes of essentiality predictions 
is due to the assumption that all tRNA and rRNA modifications 
are essential; removing these genes from predictions increases 
performance notably (accuracy = 92.3%, precision = 75.3%, 
recall = 75%, Supplementary Table S4). This accuracy 
is on par with previous approaches using the metabolic 
reaction network alone (accuracy = 91.2%, precision = 81 %, 
recall = 68%) (Orth et al, 2011). Many of the key differences 
between the M-Model and the ME-Model essentiality predic- 
tions are due to the mechanistic treatment of cofactor and 
prosthetic group synthesis and utilization in the ME-Model. 
Specifically, for a protein complex to be functional in the ME- 
Model it has to contain the embedded prosthetic groups 
required for function; while this change in model structure 
results in some false predictions of essentiality compared with 
M-Models (which include all prosthetic groups in a biomass 
objective function that does not change across conditions), the 
essentiality predictions in the ME-Model can be directly related 
to the essential enzymes requiring the prosthetic group. 



Growth demands and general constraints on 
molecular catalysis 

To compute functional states of the integrated network, growth 
demands are first imposed. Growth requires the replication of 
the organism's genome and synthesis of a new cell wall to 
contain the replicated DNA. In the ME-Model, growth rate- 
dependent DNA and cell wall demand functions formalize 
these requirements (Figure 1A; Supplementary information). 
We derived these demand functions from growth rate- 
dependent trends in cell size (Donachie and Robinson, 1987) 
and DNA content (Meyenburg and Hansen, 1987; Bremer and 
Dennis, 1996) (Supplementary information). In addition, as in 
previous models, we imposed growth-associated and non- 
growth-associated ATP utilization demands (Pirt, 1965) as the 
ostensible energy requirements (Neijssel et al, 1996; Zhuang 
et al, 2011). 

One large improvement is that RNA and protein are not 
included as demand functions (as they are in M-Models; Feist 



Figure 1 Growth demands and coupling constraints leading to growth rate-dependent changes in enzyme and ribosome efficiency. (A) Three growth rate-dependent 
demand functions derived from empirical observations determine the basic requirements for cell replication (detailed in Supplementary information). (B) Coupling 
constraints link gene expression to metabolism through the dependence of reaction fluxes on enzyme concentrations. (C, D) RNA:protein ratio predicted by the ME- 
Model with two different coupling constraint scenarios, one for variable translation rate versus growth rate (red lines) and one for constant translation rate (orange lines). 
Experimental data in (C) obtained from Scott et al (2010). (E) Phosphotransferase system (PTS) transient activity following a glucose pulse in a glucose-limited 
chemostat culture (red) and glucose uptake before the glucose pulse (blue) is plotted as a function of growth rate. The data shown were obtained from O'Brien etal 
(1 980)). Data from |i > 0.7 h ~ 1 were omitted. (F) Data from (E) are used to plot glucose uptake as a fraction of PTS activity. The resulting value is the fractional enzyme 
saturation (black line). The fractional enzyme saturation predicted by the ME-Model is plotted as a function of growth rate under carbon limitation (red dots). (G) The 
cartoon depicts changes in extra- (blue) and intra- (green) cellular substrate (circle) and product (triangle) concentrations and metabolic enzyme (orange) and ribosome 
(purple/maroon) levels as the concentration of a growth-limiting nutrient (and growth rate) increases. The dials show k e ^/k cah the effective catalytic rate over the 
maximum for metabolic enzymes (orange) and ribosomes (purple/maroon). 
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and Palsson, 2010); instead, expression of specific RNA and Lerman et al, 2012) relate the synthesis of RNA- and protein- 
protein molecules are free variables determined during ME- based molecules to their catalytic functions in the cell 
Model simulations. 'Coupling constraints' (Thiele et al, 2010; (Figure IB). The coupling constraints are based on parameters 
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that define the effective catalytic rate (/c eff ) and degradation 
rate constant (fc deg ) of molecular machines (Supplementary 
information) . 

A nutritional environment is then defined by setting 
constraints on the availability and uptake of nutrients. For a 
particular nutritional environment, there is a maximum 
growth rate at which the cell can no longer produce enough 
RNA and protein machinery to meet the demands of growth. 
The computed cellular state (biomass composition, substrate 
uptake and by-product secretion, metabolic flux, and gene 
expression) at this maximum growth rate is the predicted 
optimal response of the cell to the specified nutritional 
environment. 



Derivation of constraints on molecular catalytic 
rates 

Previous studies disagree as to if ribosomes translate with the 
same efficiency (amino acids per ribosome per second) across 
growth conditions (Young and Bremer, 1976; Scott et al, 2010). 
Here, we use the ME-Model and available data to determine an 
appropriate constraint for ribosomal efficiency as a function of 
growth rate. We find that if a constant translation rate of 20 
amino acids per second is imposed as a constraint in the ME- 
Model, the model predicts a linear growth rate-dependent 
RNA-to-protein ratio (Figure 1C), consistent with the previous 
measurements (Scott etal, 2010); however, the predicted RNA 
content does not quantitatively match measured values. In 
particular, a constant translation rate results in no RNA 
production in the limit of no growth. We therefore hypo- 
thesized that ribosomal translation rate systematically varies 
with growth rate, and back-calculated a growth rate-depen- 
dent translation rate using measured growth rate-dependent 
RNA content (Supplementary information). Ultimately, we 
recovered a Michaelis-Menten-type rate law (Figure ID) with 
a maximal rate (V max ) of ~20 amino acids per second, 
consistent with previous findings for maximal ribosomal 
speed (Bremer and Dennis, 1996); the rate law results in a 
quantitative match of RNA content compared with experi- 
mental data (Figure 1C, Pearson's r = 0.96). This rate law 
causes translation efficiency to increase under nutrient-richer 
conditions, which recent experimental evidence supports 
(Proshkin et al, 2010; Valgepea et al, 2013). Interestingly, 
when we applied the same Michaelis-Menten-type equations 
to constrain tRNA and mRNA catalytic rates, we recovered 



maximal turnover rates highly consistent with previous 
estimates (Supplementary information) . 

The catalytic rates of metabolic enzymes are variable as 
well, and tend to decrease when nutrients are limited. Both 
metabolomics (Boer et al, 2010) and proteomics (Valgepea 
et al, 2013) data sets suggest a large-scale scaling of enzyme 
efficiencies under nutrient limitation. We approximate these 
changes in metabolic catalysis in the ME-Model with two 
minimal assumptions: (1) when the cell is nutrient-limited, 
protein content is maximized (at a given growth rate) and 
(2) this protein content specifically is metabolic enzymes not 
operating at their maximal catalytic rate (Valgepea et al, 
2013) (i.e., k ei f/k CSLt <l, see Figure 1G and Supplementary 
information, Optimization procedure). These two assump- 
tions allow us to predict average catalytic rates of metabolic 
enzymes under nutrient limitation. The nutrient limitation- 
dependent shape of our computed catalytic rates matches 
assays for glucose transporters under glucose limitation 
(O'Brien et al, 1980) (Figures IE and F), LacZ under lactose 
limitation (Smith and Dean, 1972) (Supplementary 
Figure SI A), and the enzyme efficiency in a small-scale 
optimality model accounting for substrate concentrations 
with Michaelis-Menten kinetics (Molenaar et al, 2009) 
(Supplementary Figure SIB). However, because the current 
ME-Model simulation procedure assumes that k eit decreases 
uniformly across metabolism, the model does not capture the 
importance of specific enzymes for particular nutrient limita- 
tions; recent data sets (Valgepea etal, 2013) and kinetic models 
(Kim et al, 2012) can help us understand and model these 
trends better at the genome-scale. 



Growth regions under varying nutrient availability 

Upon derivation of the growth demands and molecular 
efficiencies, we investigate high-level model behavior to 
variable nutrient availability. Unlike previous genome-scale 
models (Orth et al, 2011; Thiele et al, 2012), growth rate in the 
ME-Model is a non-linear function of the substrate uptake rate 
bound (Figure 2A), and eventually reaches a maximum. This 
behavior is consistent with long-standing empirical models of 
microbial growth (Monod, 1949; Koch, 1997), in which growth 
is first nutrient-limited, but then limited by some intra- 
organismal bound. 

Under nutrient-excess conditions, growth in the ME-Model 
is limited by internal constraints on protein production 



Figure 2 Predicted growth, yield, and secretion. (A) Predicted growth rate is plotted as a function of the glucose uptake rate bound imposed in glucose minimal media. 
Three regions of growth are labeled Strictly Nutrient-Limited (SNL), Janusian, and Batch (i.e., excess of substrate) based on the dominant active constraints (nutrient 
and/or proteome limitation). The proteome-activity constraint inherent in the ME-Model results in a maximal growth rate and substrate uptake rate. The behavior of a 
genome-scale metabolic model (M-Model) is depicted with an arrow. (B) Predicted growth rates as a function of uptake of a limiting nutrient with glucose in excess. The 
shaded regions correspond to those as labeled in (A). (C) Experimental (triangle) and ME-Model-predicted (circle) acetate secretion in Nitrogen- (blue) and Carbon- (red) 
limited glucose minimal medium are plotted as a function of growth rate. Data were obtained from Zhuang ef a/(2011). The root-mean-square error (RMSE) between 
data and the ME-Model is 0.12 (for comparison, RMSE = 0.40 for the M-Model). (D) Experimental (triangle) and ME-Model-predicted (circle) carbon yield (gDW 
Biomass/g Glucose) in Carbon- (red) and Nitrogen- (blue) limited glucose minimal medium are plotted as a function of growth rate. Data were obtained from Zhuang etal 
(2011). RMSE between data and the ME-Model is 0.04 (for comparison, RMSE = 0.07 for the M-Model). (E) The cartoon depicts changes in extra- (blue) and intra- 
(green) cellular substrate (circle) and product (triangle) concentrations and metabolic enzyme (blue/orange) and ribosome (purple/maroon) levels during the Janusian 
region. Metabolic enzymes are saturated throughout the entire Janusian region. To increase the growth rate, the cell expresses metabolic pathways that have lower 
operating costs. (Pathways with the smaller blue proteins taken to be 0.25 the cost of the pathways with larger orange proteins.) A higher glucose uptake and turnover 
results, but energy yield is lower and some carbon is 'wasted' and secreted (brown triangles). The dials show k e ^/k cat , the effective catalytic rate over the maximum for 
metabolic enzymes (blue/orange) and ribosomes (purple/maroon). 
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and catalysis — the cell is 'proteome-limited' — resulting in a 
corresponding maximal growth rate (Figure 2A). This 
feature allows Batch culture growth to be simulated without 
specifying nutrient uptake bounds; instead, the ME-Model 
predicts a maximum batch growth rate and optimal substrate 
uptake rate. 



Supporting the validity of the proteomic constraints limiting 
growth in Batch culture, optimal Batch growth rates, substrate 
uptake rates, and biomass yields correlate with experimental 
data for growth on different carbon sources (Supplementary 
Table S5). The ME-Model predicted substrate uptake and 
biomass yield closely matches laboratory evolved strains 
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(Pearson's r= 0.89 and r = 0.91, respectively) (Supplementary 
Table S5C, sensitivity analysis in Supplementary Table S6). 
Though less accurate, predicted growth rates by the ME-Model 
correlate with measured growth rates in batch culture better 
than standard M-Models, in which growth rate is maximized 
subject to a specified nutrient uptake, and the correlation 
increases when compared with laboratory evolved strains (M- 
Model Pearson's r=0.49, ME-Model Pearson's r=0.61) as 
opposed to wild-type strains (M-Model Pearson's r = 0.30, ME- 
Model Pearson's r = 0.39). Other methods that include various 
approximate constraints on the total flux through the meta- 
bolic network also show an increased performance in growth 
rate prediction, though all computational methods (Beg et al, 
2007; Adadi et al, 2012) still correlate better with each other 
than with the experimental data (Supplementary Table S5B). 

When the uptake of glucose is restricted below the amount 
required for optimal growth in batch culture, the cell's growth 
is carbon-limited. Growth rate linearly increases with glucose 
uptake when glucose availability is low. In this region (termed 
as the Strictly Nutrient-Limited (SNL) region in Figure 2 A) , the 
capabilities of the proteome are not fully utilized as the 
proteome could process more incoming glucose if it was 
available (Figures 1E-G). By varying the glucose availability, 
we find that a region exists in which the cell is both nutrient- 
and proteome- limited; we refer to this transition region as the 
Janusian region (Button, 1991). ME-Model computations thus 
reveal three distinct regions of microbial growth (Figure 2A; 
see Supplementary information, Optimization procedure, 
Computational definition, and identification of growth 
regions) . 

When the uptake of non-carbon sources is restricted below 
the amount required for optimal growth in batch culture, the 
cell's growth is limited by that nutrient. Unlike carbon-source 
limitation, we find the nutrient- and proteome-limited regions 
to be distinct (Figure 2B). However, in the SNL region, growth 
is sometimes non-linear as a function of uptake rate, due to 
changing biomass requirements (e.g., Sulfur and Magnesium) . 



Effect of proteome limitation on secretion 
phenotypes 

To understand the proteome-limited growth regions in the ME- 
Model, we first investigate trends in secretion phenotypes and 
biomass yield. Under glucose limitation, different metabolic 
pathways are utilized in the Janusian region than in the SNL 
region, resulting in acetate secretion (Figure 2C, red). This 
metabolic switch, combined with growth rate-dependent ATP 
requirements, results in a concave biomass yield as a function 
of growth rate (Figure 2D, red). Both the biomass yield and 
secretion trends have repeatedly been experimentally 
observed (Zhuang et al, 2011). 

The example of glucose limitation provides an illustrative 
example for the general behavior in the Janusian growth 
region. In the Janusian region, the cell increases its growth rate 
through differential expression of pathways, as illustrated in 
Figure 2E. Due to proteome limitations, the cell switches to 
pathways that require less protein mass but are lower in 
nutrient yield (defined as energy and/or biomass precursors 
produced per molecule of limiting nutrient consumed). This 
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behavior is in contrast to that in the SNL region, in which high- 
yield pathways are optimal (as in M-Models) and growth rate 
increases through changes in the effective catalytic rate of 
metabolic enzymes (Figure 1G). These results provide further 
support that 'overflow' metabolism can be understood in 
terms of proteomic constraints, as suggested with a small-scale 
model (Molenaar et al, 2009). 

The ME-Model also predicts that acetate will be secreted at 
all growth rates when E. coli is Nitrogen (Ammonium) -limited 
(Figure 2C, blue). Experimentally, acetate is secreted under 
nitrogen limitation even at low growth rates (Hua et al, 2004). 
This secretion phenotype is explained by the ME-Model as 
follows: protein 'saved' by utilizing low-yield carbon metabo- 
lism is diverted to synthesize other enzymes that are not 
operating at their maximal catalytic capacity. 

No Janusian region is observed under non-carbon limita- 
tion. In the ME-Model, this is likely due to reaction network 
topology — while there are many alternative pathways for 
energy, redox, and biomass precursor generation in carbon 
metabolism, non-carbon nutrient assimilation is often 
achieved using more linear pathways. As a result, there are 
fewer opportunities for trade-offs between uptake rate and 
biomass yield. However, perhaps including variable substrate 
affinities for alternative pathways would reveal Janusian 
regions corresponding to non-carbon limitations. 



Central carbon fluxes reflect growth optimization 
subject to catalytic constraints 

Further supporting the importance of proteomic constraints on 
metabolic phenotypes is the prediction of central carbon fluxes 
by the ME-Model. When glucose availability is varied, the ME- 
Model predicts changes in central carbon metabolism con- 
sistent with the changes from 13 C fluxomic data sets (Figure 3; 
Supplementary Figure S2, Pearson's r=0.93, 0.90, 0.86) 
(Nanchen et al, 2006; Schuetz et al, 2007, 2012). Importantly, 
the ME-Model predicts the dominant changes in pathway splits 
as the glucose availability is varied (Figure 3, insets). 

Previous studies have evaluated the ability of M-Models 
together with assumed optimality principles to predict meta- 
bolic fluxes (Schuetz et al, 2007, 2012). These studies 
concluded that no single objective function applied to 
M-Models can accurately represent fluxomic data from all 
environmental conditions studied (Schuetz et al, 2007). 
Instead, metabolic fluxes can be understood as being Pareto 
optimal: multiple objectives are simultaneously optimized and 
their relative importance varies depending on the environ- 
mental condition (Schuetz et al, 2012). The three objectives 
needed to explain most of the variations in the data from 
Schuetz et al were (1) maximum ATP yield, (2) maximum 
biomass yield, and (3) minimum sum of absolute fluxes 
(which is a proxy for minimum enzyme investment). These 
three objectives formed a Pareto optimal surface that was 
valuable for interpreting fluxomic data; however, the surface 
was large and it was not possible to predict the importance of 
each of the objectives a priori. 

By explicitly accounting for variable growth demands, 
enzyme expression, and constraints on enzymatic activity, 
the ME-Model eliminates the need for multiple objectives; 
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Figure 3 Central carbon metabolic flux patterns under glucose-limited and glucose-excess conditions. (A-C) Relative fluxes from 13 C experiments are plotted versus 
the fluxes predicted by the ME-Model. (A, B) Comparison of nutrient-limited model solutions with chemostat culture conditions and (C) comparison of the batch ME- 
Model solution with batch culture data. All simulations and experiments correspond to growth in glucose minimal media. Fluxes are normalized so that glucose uptake is 
100. Insets show the main flux changes under increasing glucose concentrations. The only model parameter that is modulated is the glucose uptake rate bound. Data 
were obtained from Nanchen ef a/(2006) and Schuetz et a/ (2007). The ME-Model flux for the reaction 'pyk' is taken to include phosphoenolpyruvate (PEP) to pyruvate 
(PYR) conversion via the phosphotransferase system (PTS). Flux splits shown as insets were computed using the ME-Model. The percentages indicate the percent 
carbon (Glucose) converted to C0 2 (for branch labeled TCA'), acetate, and biomass. Both the TCA and acetate branches contribute to ATP production. The total mmol 
ATP per gDW biomass produced is indicated. 



growth rate optimization alone is sufficient to predict the 
fluxes through central carbon metabolism (Figure 3; 
Supplementary Figure S2; Supplementary Table S7). The 
three original objectives chosen by Schuetz et al are 
biologically meaningful dimensions and required for inter- 
preting fluxomic data when using an M-Model. In contrast, 
the ME-Model accounts for all three of these dimensions 
implicitly during growth rate maximization without adjusting 
any model parameters (see Supplementary information and 
Supplementary Table S7). Accordingly, ME-Models can deter- 
mine, at least qualitatively, the importance and weighting of 
the objectives for growth in a given environment. Ultimately, 
the primary changes in flux through central carbon meta- 
bolism can be understood as responses to the same con- 
straints causing the observed relationship in biomass yield 
(Figure 2D) : at low growth rates under carbon limitation, the 
dominant changes are due to a changing ATP demand, and in 
the transition from carbon-limited to carbon-excess (pro- 
teome-limited) conditions, the primary changes are due to the 
switch to lower yield carbon catabolism (Figure 3, insets). 



In silico gene expression profiling from 
nutrient-limited to batch growth conditions 

We now use the ME-Model to predict groups of proteins that 
change in expression under various degrees of glucose 
limitation. Under glucose limitation, the optimal proteome 
changes due to shifting growth demands and proteomic 
constraints. The groups of functionally related proteins that 
shift in our simulations match those previously reported 
experimentally (Vemuri et al, 2006; Nahku et al, 2010), but the 
model predictions of quantitative differential expression (at 
the level of single genes) are weak. We separate the analysis of 
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the SNL region (Figure 4; Supplementary Table S8A) from the 
Janusian region (Figure 5; Supplementary Table S8B), due to 
the different dominant constraints and phenotypic responses 
specific to each region. 

In the SNL region, the expression of most proteins decreases 
as growth rate increases (Figure 4B, left side of tree, 
Supplementary Figure S3). The largest group of proteins 
includes those responsible for amino-acid and cell wall 
synthesis; the growth rate-dependent decrease in expression 
of these proteins is due to the combined effects of a decrease in 
cell wall and protein biomass (g/gDW) and an increase in the 
effective catalytic rate of enzymes (Figures 1E-G). Proteins 
involved in energy metabolism also decrease in expression 
with increasing growth rate due to changes in catalytic rate and 
growth rate-dependent demands. Surprisingly, the predicted 
expression levels of several accessory transcription proteins, 
including four stress-associated sigma factors (RpoS, RpoH, 
RpoE, and RpoN), are elevated at very low growth rates, 
reflecting an association with metabolic proteins needed for 
slow growth. 

A smaller number of proteins show increases in their 
relative expression levels at higher growth rates (Figure 4B, 
right side of tree, Supplementary Figure S3). These proteins 
include those responsible for protein synthesis (ribosome, 
RNAP, and accessory proteins such as elongation factors) and 
proteins involved in RNA biosynthesis. The increase in 
expression of RNA biosynthetic machinery is necessary for 
de novo synthesis of ribonucleotides and to ensure 
flux through nucleotide salvage pathways (mainly to 
support an increase in rRNA biomass) . Finally, the expression 
profile of the pentose phosphate pathway reflects the inter- 
play between the increasing demand for ribonucleotide 
precursors and the decreasing demand for amino-acid 
precursors. 
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Figure 4 Growth rate-dependent gene expression under glucose limitation. (A) Gene expression changes predicted by the ME-Model to occur in the Strictly Nutrient- 
Limited (SNL) growth region indicated in light blue under glucose limitation in minimal media are analyzed. (B) ME-Model-computed relative gene-enzyme pair 
expression is plotted as a function of growth rate; the normalized in silico expression profiles are clustered hierarchically (see Materials and methods). Solid lines are 
expression profiles of individual gene-enzyme pairs and dotted black lines are the centroid of each cluster. Each leaf node is colored and qualitatively labeled by function. 
The number of genes in each leaf node is indicated and listed in Supplementary Table S8A. Asterisks indicate clusters with monotonic expression changes that 
significantly match the directionality observed in expression data (Wilcoxon signed-rank test, P<1 x 10 ~ 4 ). Expression data were obtained from a previous study 
(Nahku etal, 2010), in which E. coli was cultivated in a chemostat at dilution rates 0.3 h" 1 and ~0.5h" 1 . 



To validate our predicted expression changes, we 
compared gene clusters with expression data from E. coli 
grown at 0.3 h ~ 1 and ~ 0.5 h ~ 1 in a glucose-limited chemostat 
(Nahku et al, 2010). In this data set, genes in Energy 
Metabolism (purple), Core Expression Machinery (orange), 
and RNA Biosynthesis (red) all significantly change in the 
predicted direction (Wilcoxon signed-rank test, P< \ x 10 ~ 4 ), 
supporting our predicted expression profiles. The other 
clusters showed no significant changes in the data set; 
these clusters are either small in size or do not change 
monotonically, hindering direct comparison with this 
data set. The ME-Model is not yet predictive of quantitative 
gene expression changes (at the level of single genes); 
the correlation over the entire data set is statistically 
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significant (P<0.005), but weak (Pearson's r=0.14). Our 
approach is at present limited to qualitative predictions of 
the direction of change of small groups of functionally related 
proteins. 

In the Janusian region of growth (Figure 5), the cell 
transitions from carbon-limited to proteome-limited con- 
straints, resulting in a distinct transcriptional response. At 
the beginning of this transition, the cell has reached a nutrient 
level where enzymes are saturated (Figure 1G); as growth rate 
increases, the total demand of anabolic processes increases, 
causing a global increase in the bulk of metabolism and gene 
expression machinery (Figure 5B). To meet these proteome 
demands, energy metabolism is altered to favor lower yield 
catabolic pathways that require less protein (so that the 
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Figure 5 Gene expression during the Janusian region. (A) Gene expression changes predicted by the ME-Model to occur in the Janusian growth region indicated in 
purple under glucose limitation in minimal media are analyzed. (B) Simulated expression profiles are clustered using signed power (p = 25) correlation similarity and 
average agglomeration. A freely available R package was used (Langfelder and Horvath, 2008). Eleven clusters resulted. Two small clusters were removed because 
they represented stochastic expression of alternative isozymes. The first principal component of the remaining nine clusters is displayed and grouped qualitatively by 
function. (C) Many of the expression modules correspond to genes of central carbon energy metabolism. Reactions are colored according to the module color in (B). 



protein can instead be used for anabolic processes); this is 
accomplished through a decrease in TCA Cycle and Oxidative 
Phosphorylation expression in favor of a transient increase 
in the Glyoxylate Cycle followed by a large increase in 
Glycolysis and acetate secretion (Figures 5B and C), consistent 
with previously observed changes in gene expression in 
the transition to glucose-excess environments (Vemuri et al, 
2006). 

The ME-Model predicts intricate expression changes as 
glucose availability changes by employing relatively simple 
constraints on molecular catalysis and biomass composition. 
This study is the first to attempt genome-scale prediction of 
gene expression levels under changing growth rate and/or 
nutrient limitation from optimality principles alone. Syste- 
matic consideration of transcriptional regulation and inclusion 
of missing constraints and parameters impacting optimality 
(e.g., kinetic constraints and parameters) are future endeavors 
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necessary to extend the predictive power to the level of single 
genes (see Discussion). 



Discussion 

The ME-Model is a microbial growth model that computes the 
optimal cellular state for growth in a given steady-state 
environment. It takes as input the availability of nutrients to 
the cell and produces experimentally testable predictions for: 
(1) the cell's maximum growth rate (u*) in the specified 
environment, (2) substrate uptake/by-product secretion rates 
at u*, (3) metabolic fluxes at u*, and (4) gene product 
expression levels at u*. 

Important to the predictions of the ME-Model is the proper 
coupling between metabolism and gene product expression. 
Through comparison of model simulations with experimental 
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data, we derived two general classes of molecular efficiencies 
that vary based on the growth rate and the degree of nutrient 
limitation. For ribosomes (and tRNA and mRNA), we propose 
a growth rate-dependent Michaelis-Menten-type model for 
polymerization speed, which has preliminary experimental 
evidence (Proshkin et al, 2010), though we have not seen it 
previously proposed. We furthermore show that two simple 
assumptions allow us to approximate the effect of nutrient 
limitation on metabolic enzyme catalysis. While enzyme- 
specific trends in catalytic rates depend on the limiting nutrient 
(Bennett et al, 2009; Boer et al, 2010), our formulation is a first 
step toward modeling genome-scale effects of nutrient limita- 
tion and suggests that simple principles may underlie these 
trends. Both of these molecular efficiency variables are 
essential for genome-scale modeling of gene expression and 
warrant future studies to validate and refine them further. 
Paired proteomic and metabolomic data sets under nutrient- 
limited conditions will allow for a deeper understanding of 
nutrient limitation-dependent effective catalytic rates, and 
new data sets (Li et al, 2012) and models (Taller et al, 2010) on 
the processes of gene expression can help to refine model 
parameters and determine their genome-scale effects. 

The proteomic constraints inherent to the ME-Model result 
in qualitatively different growth predictions compared with 
previous genome-scale models. In the ME-Model, growth rate 
is not a simple linear function of substrate uptake bounds; 
instead, the ME-Model predicts a maximal growth rate and 
optimal substrate uptake rates, which better reflects empirical 
growth models and better predicts experimentally measured 
growth rates and substrate uptake rates. The ME-Model 
reveals three distinct growth regions, which we term SNL, 
Janusian, and Batch; while nutrient-limited (chemostat 
culture) and nutrient-excess (batch culture) conditions are 
commonplace, the Janusian region (where the cell is limited by 
both nutrient availability and proteome capacity) is rarely 
considered in microbiology. Interestingly, we observe the 
Janusian region to occur under carbon limitation but not under 
various non-carbon limitations. We take this to mean that 
Janusian regions may exist for non-carbon limitations, but the 
constraints that may cause them to arise are outside the scope 
of the current ME-Model. 

The proteomic constraints in the ME-Model also improve 
predictions of by-product secretion and metabolic flux under 
both nutrient-excess and nutrient-limited conditions. By 
accounting for the metabolic cost of proteins and limitations 
of protein production capacity, the ME-Model accurately 
decouples substrate uptake, growth rate, and growth yield, 
allowing for important rate-yield trade-offs to be predicted. In 
particular, we show that seemingly inefficient metabolism in 
batch culture and under nitrogen limitation (both when 
carbon is in excess), can be explained and predicted through 
proteomic trade-offs. This capability rectifies the dominant 
failure mode in predicting metabolic flux previously reported 
for M-Models (Schuetz et al, 2012), and suggests that a single 
objective of growth rate (if the proper constraints are included) 
may be able to predict metabolic fluxes. This result shows that 
proteomic constraints are necessary to accurately predict 
metabolic responses — optimal growth and metabolic pheno- 
types cannot be fully understood without taking gene 
expression into account. From a practical standpoint, the 
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natural parsimony present in ME-Model simulations (Lerman 
et al, 2012) strongly reduces the optimal solution space, 
allowing for more precise predictions, an important feature in 
diverse applications. The effect of proteomic constraints on 
secretion phenotypes is of particular importance for applica- 
tions in systems metabolic engineering, and will be necessary 
for simulating behavior in complex media and predicting 
nutrient preferences. 

At the level of gene expression, the ME-Model predicts 
detailed behavior in each growth region. In the SNL and 
Janusian growth regions, gene modules have distinct nutrient 
limitation-dependent profiles. A number of the gene modules 
change in the correctly predicted direction compared with 
expression data from E. coli in a chemostat at different growth 
rates (Vemuri et al, 2006; Nahku et al, 2010), supporting our 
predicted expression profiles. By predicting optimal gene 
expression profiles, the ME-Model aids in understanding the 
factors shaping the evolution of gene expression patterns (e.g., 
proteomic constraints and changing biomass composition). 

Modeling optimal transcriptional responses is complemen- 
tary to the elucidation and modeling of specific regulatory 
mechanisms (Klumpp et al, 2009; Cho et al, 2011; 
Berthoumieux et al, 2013). It is tempting to relate the 
expression profiles predicted by the ME-Model to molecular 
mechanisms underlying the control gene expression in vivo 
(Klumpp et al, 2009; Berthoumieux et al, 2013; Gerosa et al, 
2013). For example, constitutively expressed genes display 
growth rate-dependent expression trends (Klumpp and Hwa, 
2008; Klumpp et al, 2009), which might provide the cell with 
an economical way of responding to global changes in 
metabolic efficiency (Valgepea et al, 2013). Also, PurR could 
be responsible for regulating the increase in expression of 
nucleotide biosynthesis genes at higher growth rates (as PurR 
is an autorepressor, this could be accomplished through 
mechanisms described in Klumpp et al, 2009) . Finally, though 
the primary role of ArcA is to respond oxygen availability (Cho 
et al, 2006), it also represses many of the genes in the TCA 
cycle and Oxidative Phosphorylation that decrease during 
the glucose-limited to glucose-excess (Janusian) transition 
(Vemuri et al, 2006; Haverkorn van Rijsewijk et al, 2011). 
However, as regulatory mechanisms are not explicitly con- 
sidered in the ME-Model, the relation between regulatory 
mechanisms and simulated expression profiles is indirect; 
while this comparison can assist in explaining and expanding 
upon the functional roles of cellular regulators, much further 
work is required to validate the resulting hypotheses. 

As it is an optimality model, the ME-Model is particularly 
suited for studies related to adaptive laboratory evolution 
(ALE) . Recently, it was reported that it is not possible to predict 
some changes that occur during ALE in Batch culture using an 
M-Model (Harcombe et al, 2013). This is because M-Models 
only take biomass yield optimization into account; these 
results are consistent with the rate-yield trade-offs present in 
the ME-Model under nutrient-excess conditions. In the ME- 
Model, a number of inherent factors can limit cellular growth 
(e.g., translation rate and metabolic catalysis); the ME-Model 
can thus provide alternative hypotheses for the mechanisms of 
growth increase and aid in understanding the results of ALE. 

The ME-Model can simulate coarse- to fine-grained cellular 
and molecular phenotypes with an improved accuracy and 
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scope compared with previous genome-scale models. The ME- 
Model shows complex behavior as a result of linear constraints 
applied to an integrated network. The ME-Model thus shows 
that intricate and seemingly unintuitive phenotypes can be 
modeled at a genome-scale with simple enough assumptions 
to understand their underlying cause. Due to the richness of 
the model simulations, we primarily focused on E. coli growing 
in glucose minimal media at different growth rates by 
modulating the availability of glucose; there are therefore 
many future opportunities to investigate model predictions 
under many environmental and genetic conditions. 

A whole-cell E. coli model has been desired for some time 
(Crick, 1973) as such a model would have profound impacts 
for basic microbiology, the study of microbial communities, 
antibiotic discovery, the elucidation of regulatory networks, 
and systems metabolic engineering. We hope the ME-Model 
will serve as a scaffold for continued model development 
toward these practical applications. 

Materials and methods 

Network reconstruction 

The two primary reaction networks used to create the ME-Model were 
the most recent metabolic reconstruction (Orth et al, 2011), and a 
network detailing the reactions of gene expression and functional 
enzyme synthesis (Thiele et al, 2009). The gene expression recon- 
struction is formalized as a set of 'template reactions' that can be 
applied to different components (e.g., gene, peptide, and set of peptides) 
to generate balanced reactions. Merging the E. coli metabolic network 
reconstruction with the gene expression reconstruction required a 
conversion of the Boolean Gene-Protein-Reaction associations (GPRs) 
into protein complexes. We utilized EcoCyc's annotation to map gene 
sets to functional enzyme complexes. The content of the final 
reconstruction is detailed in Supplementary Tables SI, S9, and S10. 

Coupling constraint formulation and imposition 

Coupling constraints provide a mechanism for linking the flux values 
of one or more reactions in the ME-Model. For example, they were used 
to bound the number of proteins that may be translated from an mRNA 
before the mRNA decays or is transmitted to a daughter cell. They are 
also the mechanism through which we related enzyme abundance and 
activity. Often, the coupling constraints are a function of the 
organism's growth rate (n). The coupling constraints are a set of 
inequality constraints appended to the stoichiometric matrix as 
additional rows. Assumptions and literature citations for all para- 
meters used can be found in Supplementary information. 

Optimization procedure 

As the demand reactions and coupling constraints are functions of the 
organism's growth rate (|i), growth-rate optimization is not a linear 
program (LP) as in metabolic models, which rely on a linear biomass 
objective function. Instead, to optimize for growth rate, we solve a 
sequence of LPs to search for the maximum growth rate, [i*, that still 
results in a feasible LP. This search for |i* is accomplished through a 
binary search; the search procedure is slightly different depending on 
whether the cell is proteome-limited (Janusian and Batch growth 
modes) or SNL. Detailed traces of the execution of the optimization 
procedures can be found in Supplementary information. 

Hierarchical clustering 

For Figure 4B, relative fractional proteome mass was calculated for 
each gene-enzyme pair. If a gene is present in multiple enzyme 



complexes, then it is represented twice, and all subunits of an enzyme 
complex are counted separately. To filter out the stochastic expression 
of alternative isozymes (to make the observed trends clear), we 
eliminated gene-enzyme pairs that were not expressed across all 
growth rates and filtered gene-enzyme pairs that changed in relative 
expression by >0.3 across more than one pair of consecutive growth 
rates. Hierarchical clustering was performed on the resulting expres- 
sion profiles; we used a signed power (P = 6) correlation similarity (as 
in Langfelder and Horvath, 2008) and average agglomeration. 

File formats and accessibility 

The model is freely available as part of the openCOBRA Project 
(http://opencobra.sourceforge.net). 

Supplementary information 

Supplementary information is available at the Molecular Systems 
Biology website (www.nature.com/msb). 
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